Abstract. The search for neutrinoless double beta decay is a very active field in which
the number of proposals for next-generation experiments has proliferated. In this paper
we attempt to address both the sense and the sensitivity of such proposals. Sensitivity
comes first: we propose a simple and unambiguous statistical recipe to derive the
sensitivity to a putative Majorana neutrino mass, $m_{\beta\beta}$. To make sense of how the
different experimental approaches compare, we apply this recipe to a selection of proposals
and compare the resulting sensitivities. We also propose a "physics-motivated range" (PMR) of
the nuclear matrix elements as a unifying criterion between the different nuclear models. The
expected performance of the proposals is parametrized in terms of only four numbers: energy
resolution, background rate (per unit time, isotope mass and energy), detection efficiency,
and $\beta\beta$ isotope mass. For each proposal, both a reference and an optimistic scenario for
the experimental performance are studied. In the reference scenario we find that all the
proposals will be able to partially explore the degenerate spectrum, without fully covering it,
although four of them (KamLAND-Zen, CUORE, NEXT and EXO) will approach the 50 meV
boundary. In the optimistic scenario, we find that CUORE and the xenon-based proposals
(KamLAND-Zen, EXO and NEXT) will explore a significant fraction of the inverse hierarchy,
with NEXT covering it almost fully. For the long-term future, we argue that $^{136}$Xe-based
experiments may provide the best case for a 1-ton scale experiment, given the potentially
very low backgrounds achievable and the expected scalability to large isotope masses.
1 Introduction

Physicists invent, build and run experiments to search for new phenomena. In order to 
convince the funding agencies to support their research, they write proposals in which they 
estimate the sensitivity of their experiments. The definition of sensitivity is somewhat
perverse: rather than promising a discovery, the proponents of a new experiment assume that it
will fail to find a signal, and attempt to demonstrate that, in this worst-case scenario, they
will exclude a larger portion of the landscape of physical parameters (such as a cross section
or a lifetime) than other proposals. Loosely speaking, the sensitivity allows one to compare the
physics case of competing experimental approaches.

An area in which the proposals for new experiments have proliferated during the last
decade is the search for neutrinoless double beta decay ($\beta\beta0\nu$); see, for instance, [1, 2]
and references therein. The detection of such a process would establish that neutrinos are
Majorana particles [3, 4] (that is to say, truly neutral particles indistinguishable from their
antiparticles), and a measurement of the decay rate would provide direct information on
neutrino masses [5]. The observation of neutrino oscillations [6] (which implies that neutrinos
have a non-zero mass, an essential condition for $\beta\beta0\nu$ to exist) and the possible evidence
of a $\beta\beta0\nu$ signal in the Heidelberg-Moscow experiment [7] have boosted the interest in
double beta decay searches, and prompted a new generation of experiments with improved
sensitivity. In spite of the formidable experimental challenge (or because of it) the field is
rich in new ideas. Among the newcomers one finds many experimental approaches, as well
as several source isotopes. All of them claim to be capable of reaching a sensitivity to an
effective Majorana neutrino mass (defined in Sec. 2) of at least 100 meV, and many advertise
a second phase in which the sensitivity can be improved to some 20 meV.

However, considerable confusion can arise because the different proposals often use
different recipes to compute the sensitivity. Also, similar sensitivity results sometimes hide
more or less credible detector performance assumptions, concerning, for example, energy
resolution or radiopurity. An additional complication arises when comparing the sensitivity of
experimental approaches using different isotopes. In such a case, a large theoretical
uncertainty exists in relating the decay rate sensitivity of a proposal to the effective Majorana
neutrino mass sensitivity (see Sec. 2). And to further blur the picture, the scalability, cost,
and schedules of the proposed experiments vary greatly.

In this paper we attempt to address both the sense and the sensitivity of the neutrinoless
double beta decay experimental approaches currently proposed. In Section 2 we discuss
the relationship between the rate of $\beta\beta0\nu$, neutrino masses and nuclear theory, motivating
our assumptions for the latter. Section 3 briefly reviews the experimental aspects of $\beta\beta0\nu$
searches. In Section 4 we discuss in detail our definition of sensitivity and show the problems
that arise when the observed number of events is small compared to the expected background
rate. Next, the Feldman-Cousins unified approach [8] is presented and applied to a $\beta\beta0\nu$
experiment with background (we give more details of this procedure in Appendix A). Section
5 describes the main features of the experiments taken into consideration, defining a plausible
range of detector performance indicators for each proposal. We try to make sense of these
proposals in Section 6, where the corresponding $m_{\beta\beta}$ sensitivities are compared as a function
of exposure (kg·year) and for different scenarios. We also briefly address the question of
scalability to large isotope masses.

2 Rate of neutrinoless double beta decay 

Neutrinoless double beta decay is a very rare nuclear transition that occurs if neutrinos are
massive Majorana particles [3, 4]. It involves the decay of a nucleus with Z protons into
a nucleus with Z + 2 protons and the same mass number A, accompanied by the emission
of two electrons: $(Z, A) \to (Z + 2, A) + 2e^{-}$. The sum of the kinetic energies of the two
emitted electrons is always the same, and corresponds to the mass difference between mother
and daughter nuclei, $Q_{\beta\beta}$. The decay violates lepton number conservation and is therefore
forbidden in the Standard Model.

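The simplest underlying mechanism of $\beta\beta0\nu$ is the virtual exchange of light ($m_{\nu_i} < 10$
MeV) Majorana neutrinos, although, in general, any source of lepton number violation (LNV)
can induce $\beta\beta0\nu$ and contribute to its amplitude. If we assume that the dominant LNV
mechanism at low energies is the light-neutrino exchange, the half-life of $\beta\beta0\nu$ can be written
as:

$$ \left( T^{0\nu}_{1/2} \right)^{-1} = G^{0\nu} \, \left| M^{0\nu} \right|^{2} \, m_{\beta\beta}^{2} \tag{2.1} $$

where $G^{0\nu}$ is the phase-space factor, $M^{0\nu}$ is the nuclear matrix element, and $m_{\beta\beta}$ is
the effective Majorana mass of the electron neutrino:

$$ m_{\beta\beta} = \Big| \sum_{i} U_{ei}^{2} \, m_{i} \Big| \tag{2.2} $$

where $m_i$ are the neutrino mass eigenstates and $U_{ei}$ are elements of the neutrino mixing
matrix.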
Neutrino oscillation experiments constrain how the effective Majorana mass in Eq. 2.2
changes with the absolute neutrino mass scale, defined as $\min\{m_1, m_3\}$. Only upper bounds
on the absolute neutrino mass, of order 1 eV, currently exist. Also, current neutrino
oscillation results cannot differentiate between two possible mass orderings, usually referred to as
normal and inverted orderings. In the normal ordering case, the gap between the two lightest
mass eigenstates corresponds to the small mass difference, measured by solar experiments.
In this case, the effective Majorana mass can be as low as 2 meV [2]. If the spectrum is
extremely hierarchical, and therefore the absolute neutrino mass can be neglected, $m_{\beta\beta}$ can
be as high as 5 meV in the normal ordering case. In the inverted ordering case, the gap
between the two lightest states corresponds to the large mass difference, measured by
atmospheric experiments. In this case, $m_{\beta\beta}$ can be as low as about 15 meV [2]. The upper limit
on $m_{\beta\beta}$ if the neutrino mass can be neglected is approximately 50 meV in this case. Finally,
in the particular case in which the neutrino mass differences are very small compared with
its absolute scale, we speak of the degenerate spectrum. In this case, larger values for $m_{\beta\beta}$
can be obtained, approximately above 50 meV.

All nuclear structure effects in $\beta\beta0\nu$ are included in the nuclear matrix element (NME).
Its knowledge is essential in order to relate the measured half-life to the neutrino masses, and
therefore to compare the sensitivity and results of different experiments, as well as to predict
which are the most favorable nuclides for $\beta\beta0\nu$ searches. Unfortunately, NMEs cannot be
separately measured, and must be evaluated theoretically.

In the last few years the reliability of the calculations has greatly improved, with several 
techniques being used, namely: the Interacting Shell Model (ISM) [9-11]; the Quasiparticle 
Random Phase Approximation (QRPA) [12-14]; the Interacting Boson Model (IBM) [15]; 
and the Generating Coordinate Method (GCM) [16]. Figure 1 shows the most recent results 
of the different methods. We can see that in most cases the results of the ISM calculations 
are the smallest ones, while the largest ones may come from the IBM, QRPA or GCM. 

Should the differences between the different methods be treated as an uncertainty in
sensitivity calculations? Should we assign an error bar to the distance between the maximum
and the minimum values? This approach has, we argue, the undesirable effect of blurring the 
relative merits of different experimental approaches, and does not reflect the recent progress 
in the theoretical understanding of the treatment of nuclear matrix elements. 

Each of the major methods has advantages and drawbacks, whose effect on the
values of the NME can sometimes be explored. The clear advantage of the ISM calculations
is their full treatment of the nuclear correlations, while their drawback is that they may
underestimate the NMEs due to the limited number of orbits in the affordable valence spaces.
It has been estimated [19] that the effect can be of the order of 25%. In contrast, the
QRPA variants, the GCM in its present form, and the IBM are bound to underestimate the
multipole correlations in one way or another. As it is well established that these correlations
tend to diminish the NMEs, these methods should tend to overestimate them [9, 20].

With these considerations in mind, we propose here our best estimate of the NME values,
obtained as the central value of a physics-motivated range (PMR) of theoretical values for
$^{76}$Ge, $^{82}$Se, $^{130}$Te, $^{136}$Xe and $^{150}$Nd. In what follows we select the results of the major nuclear
structure approaches which share the following common ingredients: (a) nucleon form factors
of dipole shape; (b) soft short range correlations computed with the UCOM method [21]; (c)
unquenched axial coupling constant $g_A$; (d) higher order corrections to the nuclear current
[22]; and (e) nuclear radius $R = r_0 A^{1/3}$, with $r_0 = 1.2$ fm [23]. Therefore, the remaining
discrepancies between the diverse approaches are solely due to the different nuclear wave
functions that they employ.

Figure 1. Recent NME calculations from the different techniques (GCM [16], IBM [15], ISM [10, 11],
QRPA(J) [12], QRPA(T) [14, 17, 18]) with UCOM short range correlations. All the calculations use
$g_A = 1.25$; the IBM-2 results are multiplied by 1.18 to account for the difference between Jastrow
and UCOM, and the RQRPA are multiplied by 1.1/1.2 so as to line them up with the others in their
choice of $r_0 = 1.2$ fm. The shaded intervals correspond to the proposed physics-motivated ranges (see
text for discussion).

Let us start with the $^{150}$Nd case, for which no ISM value is available. The GCM
calculation [16] is clearly the most sophisticated on the market from the point of view of the
nuclear structure, and gives the smallest NME. The two other approaches, QRPA [18] and
IBM [15], give larger and similar results; we therefore average the two to set the upper end of
the range, and propose an NME range [1.71-2.95] with a best estimate of 2.33.

For $^{136}$Xe we have an ISM value which defines the lower end of the range.¹ For the
upper one, we average the NMEs from the RQRPA calculation of the Tübingen group [14],
the GCM [16], the IBM and the more recent pnQRPA result from the Jyväskylä-La Plata
collaboration [12]. The resulting interval is [2.19-3.45], and our most probable value is 2.82.

With the same ingredients we obtain a range [2.65-4.61] for $^{130}$Te, corresponding to a
best estimate of 3.63.

For $^{82}$Se the interval is [2.64-4.32] using the latest SRQRPA results [17], and the NME
most probable value is 3.48.

Finally we come to $^{76}$Ge, where we can use an extra filter, namely, to demand that the
calculations be consistent with the occupation numbers measured by J. Schiffer and
collaborators [24, 25]. This leaves us with the ISM [11], the SRQRPA [17] and the pnQRPA
[12]. Averaging again the two QRPA values we obtain the interval [3.26-4.87], and a best
estimate of 4.07.
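The best estimates quoted above are simply the midpoints of the corresponding PMR intervals. The short sketch below recomputes them from the interval end points given in the text; the dictionary layout and the function names are illustrative choices, not part of the original analysis.

```python
# PMR intervals (low, high) for the NME, as quoted in the text.
PMR = {
    "Ge-76": (3.26, 4.87),
    "Se-82": (2.64, 4.32),
    "Te-130": (2.65, 4.61),
    "Xe-136": (2.19, 3.45),
    "Nd-150": (1.71, 2.95),
}


def best_estimate(interval):
    """Central value of a (low, high) interval."""
    low, high = interval
    return 0.5 * (low + high)


def relative_half_width(interval):
    """Half-width of the interval divided by its central value."""
    low, high = interval
    return 0.5 * (high - low) / best_estimate(interval)


if __name__ == "__main__":
    for isotope, interval in PMR.items():
        centre = best_estimate(interval)
        spread = 100.0 * relative_half_width(interval)
        print(f"{isotope}: {centre:.2f} (+/- {spread:.0f}%)")
```

The relative half-width printed alongside each central value gives a rough feeling for how much the different nuclear models still disagree for each isotope.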



¹ We could have increased this value in view of the above discussion, but we shall refrain from doing so and
use only published values.
Isotope    W         Q_ββ    |M^0ν|   |G^0ν|^-1          T^0ν_1/2 (m_ββ = 50 meV)   N_0ν / N_0ν(Ge)
           (g/mol)   (keV)            (10^25 y eV^2)     (10^27 y)
76Ge       75.9      2039    4.07     4.09               0.95                       1.0
82Se       81.9      2996    3.48     0.93               0.26                       3.3
130Te      129.9     2528    3.63     0.59               0.18                       3.1
136Xe      135.9     2458    2.82     0.55               0.25                       2.1
150Nd      149.9     3368    2.33     0.13               0.15                       3.3

Table 1. Physical properties of different isotopes considered in this paper: atomic weight, W [26];
$\beta\beta$ decay Q-value [27-31]; the PMR for the nuclear matrix element ($|M^{0\nu}|$, see text for discussion);
inverse of $\beta\beta$ decay phase-space factor ($|G^{0\nu}|^{-1}$, from [32]). For illustrative purposes, we also give
the $\beta\beta0\nu$ half-life for a fixed $m_{\beta\beta}$ value ($T^{0\nu}_{1/2}(m_{\beta\beta} = 50$ meV$)$), and the expected number of isotope
$\beta\beta0\nu$ decays relative to $^{76}$Ge $\beta\beta0\nu$ decays for a fixed isotope mass ($N_{0\nu}/N_{0\nu}(\mathrm{Ge})$).

The properties of the five isotopes relevant to our work, including our proposed values
for the NMEs, are shown in Table 1. For the sake of simplicity and for the above-mentioned
intended focus on directly comparing different experimental approaches, the majority of the
results in this work rely only on our best estimate for NME values. On the other hand,
we acknowledge that some uncertainty in NME calculations still exists, and that our recipe
for estimating NME most probable values is by no means unique. For this reason, we also
tabulate in Sec. 6.3 our final $m_{\beta\beta}$ results separately for all NME calculations considered
above.
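The last column of Table 1 can be cross-checked from two of the others. For a fixed isotope mass and running time, the number of nuclei scales as 1/W and the decay probability as 1/T, so the yield relative to germanium should be approximately (W_Ge · T_Ge) / (W · T). The sketch below applies this scaling to the tabulated values; the variable and function names are illustrative, and small differences with respect to the table are expected from rounding.

```python
# From Table 1: atomic weight W (g/mol) and the half-life for
# m_bb = 50 meV in units of 1e27 y.
TABLE_1 = {
    "Ge-76": (75.9, 0.95),
    "Se-82": (81.9, 0.26),
    "Te-130": (129.9, 0.18),
    "Xe-136": (135.9, 0.25),
    "Nd-150": (149.9, 0.15),
}


def relative_yield(isotope, reference="Ge-76"):
    """Expected decays per unit isotope mass, relative to the reference isotope."""
    weight, half_life = TABLE_1[isotope]
    ref_weight, ref_half_life = TABLE_1[reference]
    return (ref_weight * ref_half_life) / (weight * half_life)


if __name__ == "__main__":
    for name in TABLE_1:
        print(f"{name}: {relative_yield(name):.1f}")
```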

3 Experimental aspects 

The choice among $\beta\beta$ isotopes discussed in Sec. 2 is not the only important factor to be
considered when optimizing future $\beta\beta0\nu$ proposals. According to Table 1, one should prefer,
everything else being the same, experiments based on $^{150}$Nd, $^{82}$Se, $^{130}$Te, or even $^{136}$Xe,
rather than experiments based on $^{76}$Ge. However, Ge-based experiments have dominated
the field so far.

The reason is that $\beta\beta0\nu$ experiments must be designed to measure the kinetic energy
of the electrons emitted in the decay. Due to the finite energy resolution of any detector,
$\beta\beta0\nu$ events are reconstructed within a non-zero energy range centered around $Q_{\beta\beta}$, typically
following a Gaussian distribution. As will be demonstrated in Sec. 6, any background event
falling in this energy range dramatically limits the sensitivity of the experiment. Good energy
resolution is therefore essential. Germanium semiconductor detectors provide the best energy
resolution achieved to date: in a $^{76}$Ge experiment, a region able to contain most of the signal
(called the region of interest (ROI), and often taken as 1 FWHM around $Q_{\beta\beta}$) would be
only a few keV wide.

Unfortunately, energy resolution is not enough by itself: a continuous spectrum arising
from natural decay chains can easily overwhelm the signal peak, given the enormously long
decay times explored. Consequently, additional signatures to discriminate between signal
and backgrounds, such as event topology or daughter ion tagging, are desirable. Also, the
experiments require underground operation and shielding to reduce external background
due to cosmic rays and surrounding radioactivity, and the use of very radiopure materials. In
addition, large detector masses, high $\beta\beta$ isotope enrichment, and high $\beta\beta$ detection efficiency
are clearly desirable, given the rare nature of the process searched for. No experimental
technique scores the highest mark in all of the above, and thus different approaches are
possible.

The Heidelberg-Moscow (HM) experiment [33], using high-purity germanium diodes
enriched to 86% in the isotope $^{76}$Ge, set the most sensitive limit to date: $T^{0\nu}_{1/2}(^{76}\mathrm{Ge}) >
1.9 \times 10^{25}$ years (90% CL). The experiment accumulated a total exposure of 71.7 kg·y,
and achieved a background rate in the ROI of 0.06 counts/(keV·kg·y) after pulse shape
identification. The energy resolution (FWHM) at $Q_{\beta\beta}$ was 4.23 ± 0.14 keV. A subset of the
Collaboration observed evidence for a $\beta\beta0\nu$ signal [7]. The claim has been severely questioned
[34], but no one has been able to prove it wrong. According to it, the isotope $^{76}$Ge would
undergo $\beta\beta0\nu$ decay with a half-life of about $1.5 \times 10^{25}$ years. Using the PMR nuclear
matrix element, this corresponds to a neutrino mass of about 0.4 eV.
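The conversion from the claimed half-life to a neutrino mass can be checked numerically. The sketch below assumes the standard light-neutrino-exchange relation, (T_1/2)^-1 = G |M|^2 m_ββ^2, and uses the Table 1 values for $^{76}$Ge (inverse phase-space factor 4.09 × 10^25 y eV^2 and PMR matrix element 4.07); the function name is illustrative.

```python
import math


def effective_mass_eV(half_life_y, inv_phase_space_y_eV2, nme):
    """Effective Majorana mass (eV) from a half-life (y), assuming
    1/T = G * |M|^2 * m_bb^2 with G^-1 given in y eV^2."""
    return math.sqrt(inv_phase_space_y_eV2 / (nme ** 2 * half_life_y))


if __name__ == "__main__":
    # Ge-76: claimed half-life, Table 1 phase-space factor and PMR matrix element.
    m_bb = effective_mass_eV(1.5e25, 4.09e25, 4.07)
    print(f"m_bb = {m_bb:.2f} eV")
```

Because the half-life scales as the inverse square of both the matrix element and the mass, a 20% change in the assumed NME shifts the inferred mass by roughly the same 20%.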

In this paper, we compare the $m_{\beta\beta}$ sensitivity of different experimental approaches by
parametrizing in a simple and intuitive way their detector performance, as will be discussed 
in Sec. 5. Before that, in Sec. 4, we discuss the statistical recipe that we will adopt to derive 
the experimental sensitivities. 



4 A unified treatment of sensitivity estimates of $\beta\beta0\nu$ experiments
4.1 An ideal experiment 

All $\beta\beta0\nu$ experiments have to deal with non-negligible backgrounds, an only partially efficient
$\beta\beta0\nu$ event selection, and varying degrees of difficulty in extrapolating their detection technique
to large masses. It is instructive, however, to imagine an ideal experiment defined by the
following parameters: (a) the detector mass is 100% made of a $\beta\beta$ source; (b) perfect
energy resolution, and/or infinite radiopurity, resulting in a null background rate; (c) perfect
detection efficiency; and (d) scalable to large masses at will, that is, exposure as large as
desired.

Suppose that such an experiment runs for a total exposure $Mt$. The expected number
of $\beta\beta0\nu$ events is given by

$$ N = \log 2 \cdot \frac{N_A}{W} \cdot \frac{\varepsilon \, M t}{T^{0\nu}_{1/2}} \tag{4.1} $$

where $N_A$ is the Avogadro constant, $W$ is the atomic weight of the $\beta\beta$-decaying isotope, $\varepsilon$ is
the signal detection efficiency, $M$ is the $\beta\beta$ isotope mass and $t$ the data-taking time.
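As a rough illustration of the event yields involved, the sketch below evaluates the expected number of decays, assuming the usual form N = log 2 · (N_A / W) · ε M t / T_1/2, which is valid when the running time is much shorter than the half-life. The example uses the $^{76}$Ge entries of Table 1; the function name and the explicit unit conversion (kg to g) are illustrative choices.

```python
import math

N_A = 6.02214076e23  # Avogadro constant, 1/mol


def expected_events(half_life_y, weight_g_per_mol, mass_kg, time_y, efficiency=1.0):
    """Expected number of decays for a given isotope mass and running time
    (assumes time_y is much shorter than half_life_y)."""
    nuclei = N_A * (mass_kg * 1000.0) / weight_g_per_mol
    return math.log(2) * nuclei * efficiency * time_y / half_life_y


if __name__ == "__main__":
    # Ge-76 with the Table 1 half-life for m_bb = 50 meV, 100 kg for 1 year.
    n = expected_events(0.95e27, 75.9, 100.0, 1.0)
    print(f"expected events in 100 kg y: {n:.2f}")
```

Even with perfect efficiency, an exposure of 100 kg·y of $^{76}$Ge yields less than one expected event at 50 meV, which is why the background level plays such a central role in what follows.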

Identical experiments running for the same total exposure would each observe a
different number of events, $n$, with expectation value $\mu = N$. The probability distribution
function (pdf) of $n$ is a Poisson distribution:

$$ \mathrm{Po}(n; \mu) = \frac{\mu^{n}}{n!} \, e^{-\mu} \tag{4.2} $$

For example, Figure 2 (left) shows a Poisson distribution with $\mu = 5$. If we run a large number
of identical experiments, only 17.5% will observe 5 events. In fact, the same percentage will
observe 4 events, and a small but non-null number of experiments (7 in a thousand) will
observe zero events. Consider now an experiment whose outcome is $n_{\rm obs} = 4$. What can be
inferred about the true value $\mu$ from this measurement?
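The probabilities quoted in this paragraph follow directly from the Poisson formula. A minimal sketch (the function name is an illustrative choice):

```python
import math


def poisson_pmf(n, mu):
    """Probability of observing n events when mu are expected."""
    return mu ** n * math.exp(-mu) / math.factorial(n)


if __name__ == "__main__":
    for n in (0, 4, 5):
        print(f"Po({n}; 5) = {poisson_pmf(n, 5.0):.4f}")
```

The probabilities for n = 4 and n = 5 coincide because, for an integer mean, the Poisson distribution takes the same value at n = μ − 1 and n = μ.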

A common way to report a result on $\mu$, proposed by J. Neyman in 1937 [35], is to
give a confidence interval (CI) where $\mu$ is likely to be included. How likely an interval is
to contain the true value is determined by the confidence level (CL), usually expressed as a
percentage. The end points of the CI, called the confidence limits, are defined in terms of a
probability: we define the lower (upper) limit $\mu_{\rm lo}$ ($\mu_{\rm up}$) as the value of the parameter $\mu$ such
that, if we carry out a large number of experiments following the Poisson distribution $\mathrm{Po}(n; \mu_{\rm lo})$
($\mathrm{Po}(n; \mu_{\rm up})$), then a fraction $\alpha$ ($\beta$) of them at most will yield a number larger (smaller) than
or equal to $n_{\rm obs}$. Therefore, to compute the limits the following equations can be solved
numerically:

$$ \alpha = \sum_{n = n_{\rm obs}}^{\infty} \mathrm{Po}(n; \mu_{\rm lo}) \tag{4.3} $$

$$ \beta = \sum_{n = 0}^{n_{\rm obs}} \mathrm{Po}(n; \mu_{\rm up}) \tag{4.4} $$


Figure 2 (right) illustrates the above definition. The lower and upper limits are
computed for $n_{\rm obs} = 4$ and $\alpha = \beta = 0.05$. The filled (blue) circles represent $\mathrm{Po}(n; \mu_{\rm lo} = 1.37)$,
while the filled (red) squares represent $\mathrm{Po}(n; \mu_{\rm up} = 9.15)$. Adding the filled circles
corresponding to $n < 4$ yields a probability of 0.95 (therefore, the integrated probability to find
$n \geq n_{\rm obs}$ is 0.05). Adding the filled squares corresponding to $n > 4$ yields a probability of
0.95 (therefore, the integrated probability to find $n \leq n_{\rm obs}$ is 0.05). Notice that our choice of
a symmetric interval to cover the true value $\mu = 5$ is not especially well motivated. We could
have decided to set asymmetric intervals, for example, $\alpha = 0.07$ and $\beta = 0.03$, and would
still have a 90% CL. Nevertheless, this is the most common choice for two-sided confidence
intervals.

Consider now another experiment: an unlucky team runs under conditions identical to
those of the previous experiment, but finds $n_{\rm obs} = 0$. This is an unlikely but by no means
impossible outcome, with a probability of 0.7%. What should the team report? Clearly a
lower limit cannot be found, but an upper limit can still be determined by setting $n_{\rm obs} = 0$
in Eq. (4.4):

$$ \mu_{\rm up} = -\log(\beta) \tag{4.5} $$

For $\beta = 0.1$ (that is, for 90% CL), Equation (4.5) yields $\mu_{\rm up} \simeq 2.3$, and for $\beta = 0.05$ (95%
CL), we find $\mu_{\rm up} \simeq 3$. The meaning of the upper limit is very clear. In a Poisson distribution
with $\mu = 2.3$, one observes $n = 0$ 10% of the time, while for $\mu = 3.0$, one observes $n = 0$ 5% of the
time.
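The limits quoted above can be reproduced by solving the two defining conditions numerically: the lower limit is the mean for which the probability of n ≥ n_obs equals α, and the upper limit is the mean for which the probability of n ≤ n_obs equals β. The sketch below does this with a plain bisection; the helper names and the search bracket [0, 100] are assumptions made for the illustration.

```python
import math


def poisson_cdf(n_obs, mu):
    """P(n <= n_obs) for a Poisson distribution with mean mu."""
    return sum(mu ** n * math.exp(-mu) / math.factorial(n) for n in range(n_obs + 1))


def bisect(f, lo, hi, tol=1e-9):
    """Root of f in [lo, hi], assuming f changes sign exactly once in the bracket."""
    sign_lo = f(lo) > 0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if (f(mid) > 0) == sign_lo:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def upper_limit(n_obs, beta):
    """Mean for which P(n <= n_obs) equals beta."""
    return bisect(lambda mu: poisson_cdf(n_obs, mu) - beta, 0.0, 100.0)


def lower_limit(n_obs, alpha):
    """Mean for which P(n >= n_obs) equals alpha (requires n_obs > 0)."""
    return bisect(lambda mu: 1.0 - poisson_cdf(n_obs - 1, mu) - alpha, 0.0, 100.0)


if __name__ == "__main__":
    print("n_obs = 4:", round(lower_limit(4, 0.05), 2), round(upper_limit(4, 0.05), 2))
    print("n_obs = 0:", round(upper_limit(0, 0.10), 2), round(upper_limit(0, 0.05), 2))
```

For n_obs = 0 the sum reduces to a single term, exp(−μ_up) = β, so the numerical solution should coincide with the closed form −log β.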



Figure 2. Left: a Poisson distribution with $\mu = 5$. The probability of observing 5 events is 17.5%,
which means that only that fraction of experiments will yield a number of events equal to the
expectation value $\mu$. Right: lower ($\mu_{\rm lo}$) and upper ($\mu_{\rm up}$) limits computed for an observed number of
events $n_{\rm obs}$ equal to 4, and $\alpha = \beta = 0.05$. Two Poisson distributions are shown: $\mathrm{Po}(n; \mu = \mu_{\rm lo})$ and
$\mathrm{Po}(n; \mu = \mu_{\rm up})$.

Figure 3. Sensitivity of ideal experiments at 90% CL for different $\beta\beta$ isotopes. Since the yields are
very similar, the sensitivities of $^{82}$Se, $^{130}$Te and $^{150}$Nd overlap.

Therefore, an ideal experiment that observes no events after a given exposure would
report a negative result, that is, an upper limit on the $\beta\beta0\nu$ decay rate, $(T^{0\nu}_{1/2})^{-1}$, or possibly
on the more relevant physical parameter $m_{\beta\beta}$. From Equations (2.1) and (4.1), the latter
upper limit can be written as:

$$ m_{\beta\beta} = K_1 \sqrt{\frac{N}{\varepsilon \, M t}} \tag{4.6} $$

where $K_1$ is a constant that depends only on the isotope type. Substituting $N$ by 2.3 (3),
one gets the upper limit at 90% (95%) CL, which improves with $\sqrt{1/Mt}$.

In the following, we define the sensitivity of our experiments as the average upper limit
one would get from an ensemble of experiments with the expected background and no true
signal.

For an ideal experiment, the expected background is exactly $b = 0$, resulting in no
Poisson fluctuations in the observed number of signal plus background events for the
experiments within the ensemble, which is always equal to zero in this case. Therefore
the sensitivity of the experiment is in this case simply given by the upper limit reported by
the unlucky experiment. Figure 3 shows the sensitivity of the five isotopes considered. Even
ideal experiments need a large exposure ($> 10^{3}$ kg·y) to fully explore the inverse hierarchy.
To start exploring the normal hierarchy, one would need of the order of $10^{4}$ kg·y, that is,
perfect detectors of 1 ton running for 10 years: a truly formidable experimental challenge.
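To get a feeling for these numbers, the sketch below evaluates the background-free upper limit on $m_{\beta\beta}$ as a function of exposure. It assumes the standard relations 1/T = G |M|^2 m_ββ^2 and N = log 2 · (N_A / W) · ε M t / T, which combine into m_ββ = sqrt(W G^-1 N / (log 2 · N_A |M|^2 ε M t)); the isotope inputs are the $^{76}$Ge entries of Table 1, and the function name and unit handling are illustrative choices rather than the procedure used to produce Figure 3.

```python
import math

N_A = 6.02214076e23  # Avogadro constant, 1/mol


def ideal_sensitivity_eV(exposure_kg_y, weight_g_per_mol, inv_phase_space_y_eV2,
                         nme, efficiency=1.0, n_up=2.3):
    """Upper limit on m_bb (eV) for a background-free experiment that observes
    zero events; n_up = 2.3 corresponds to 90% CL, n_up = 3.0 to 95% CL."""
    # Isotope-dependent constant squared, in eV^2 g y.
    k1_squared = (weight_g_per_mol * inv_phase_space_y_eV2
                  / (math.log(2) * N_A * nme ** 2))
    exposure_g_y = exposure_kg_y * 1000.0
    return math.sqrt(k1_squared * n_up / (efficiency * exposure_g_y))


if __name__ == "__main__":
    for exposure in (100.0, 1000.0, 10000.0):
        m_bb = ideal_sensitivity_eV(exposure, 75.9, 4.09e25, 4.07)
        print(f"{exposure:>8.0f} kg y -> {1000.0 * m_bb:.0f} meV")
```

Because the limit scales as the inverse square root of the exposure, gaining a factor of two in $m_{\beta\beta}$ requires four times more exposure even in this idealized case.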

4.2 Experiments with background and the collapse of the classical limit 

Consider now the more realistic case of an experiment with background. Call $b$ the expected
value of the background, and assume (unrealistically) that it is known with no uncertainty.
The relevant pdf will then be:

$$ P(n | \mu, b) \equiv \mathrm{Po}(n; \mu + b) = \frac{(\mu + b)^{n}}{n!} \, e^{-(\mu + b)} \tag{4.7} $$

where $\mu$ is the (unknown) mean signal expectation and $b$ is the known mean background
expectation. The Poisson variable $n$ is such that $n = n_s + n_b$, where the signal and
background Poisson variables $n_s$ and $n_b$ have mean expectation values $E[n_s] = \mu$ and $E[n_b] = b$,
respectively.

Figure 4. The collapse of the classical method: as the number of observed events becomes smaller,
the Poisson distribution has to shift left to guarantee that the cumulative percentage to the left of
$n_{\rm obs}$ gets to 5%. In the extreme case of $n_{\rm obs} = 0$, the expectation value that yields 5% of the events
in $n = 0$ is $\mu + b = 3$. Since $b = 5$, this forces $\mu_{\rm up} = -2$, an absurd result.

A priori, it appears as if we could treat the problem in exactly the same way as for the
case without background. A CI can be constructed using the following equations:

$$ \alpha = \sum_{n = n_{\rm obs}}^{\infty} \frac{(\mu_{\rm lo} + b)^{n}}{n!} \, e^{-(\mu_{\rm lo} + b)} \tag{4.8} $$

$$ \beta = \sum_{n = 0}^{n_{\rm obs}} \frac{(\mu_{\rm up} + b)^{n}}{n!} \, e^{-(\mu_{\rm up} + b)} \tag{4.9} $$



Let us work out an explicit example. Suppose that we run an experiment in which
the predicted background is $b = 5$, and we observe $n_{\rm obs} = 5$ events. Then, Equation (4.9)
becomes:

$$ \beta = \sum_{n = 0}^{5} \frac{(\mu_{\rm up} + 5)^{n}}{n!} \, e^{-(\mu_{\rm up} + 5)} \tag{4.10} $$

Solving this equation numerically with $\beta = 0.05$ (95% CL) yields an upper limit of $\mu_{\rm up} = 5.51$.
The lower limit is also readily computed, obtaining $\mu_{\rm lo} = -3.03$. Since $\mu$ is physically bounded
to non-negative values, we set $\mu_{\rm lo} = 0$.

We now repeat the procedure for $n_{\rm obs} = 4, 3, 2, 1$ and 0, while keeping $b = 5$. In
this case, we obtain $\mu_{\rm up} = 4.15, 2.75, 1.29, -0.25$ and $-2$, respectively. The reason for the
decrease in $\mu_{\rm up}$ as $n_{\rm obs}$ decreases can be understood by observing Fig. 4. As the number of
observed events becomes smaller, the Poisson distribution has to shift left to guarantee that
the cumulative percentage to the left of $n_{\rm obs}$ gets to 5%. The situation is summarized in
Fig. 5: due to the discreteness of the Poisson distribution, the "classical upper limit" method
fails to provide meaningful results (i.e., it yields negative upper limits) when the observed
number of events is small compared with the expected number of background events.

Figure 5. Upper limit $\mu_{\rm up}$ as a function of the number of observed events $n_{\rm obs}$, estimated solving
Eq. (4.9) for a background prediction $b = 5$. The limit does not make sense when the observed number
of events is small compared with the expected background ($n_{\rm obs} \ll b$).
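The sequence of limits quoted above can be reproduced by solving the classical condition for the total mean μ_up + b, that is, requiring the probability of n ≤ n_obs to equal β, and then subtracting the background. The sketch below uses a simple bisection; the function names and the bracket [0, 100] for the total mean are assumptions made for the illustration.

```python
import math


def poisson_cdf(n_obs, mean):
    """P(n <= n_obs) for a Poisson distribution with the given mean."""
    return sum(mean ** n * math.exp(-mean) / math.factorial(n)
               for n in range(n_obs + 1))


def classical_upper_limit(n_obs, b, beta=0.05):
    """Classical upper limit on the signal mean for known background b.
    Nothing in the construction prevents the result from being negative."""
    lo, hi = 0.0, 100.0  # bracket for the total mean mu_up + b
    while hi - lo > 1e-9:
        mid = 0.5 * (lo + hi)
        if poisson_cdf(n_obs, mid) > beta:  # the CDF decreases as the mean grows
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi) - b


if __name__ == "__main__":
    for n_obs in range(5, -1, -1):
        print(n_obs, round(classical_upper_limit(n_obs, b=5.0), 2))
```

The total mean that solves the equation depends only on n_obs and β, so subtracting an ever larger b inevitably drives the "limit" on the signal below zero.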

4.3 The unified approach and sensitivity of an experiment with background 

The collapse of the classical limit was solved in 1997 by G. Feldman and R. Cousins. In 
their best-seller paper, "Unified approach to the classical statistical analysis of small signals" 
[8], they introduce a method for the construction of confidence intervals using an ordering 
principle based on likelihood ratios. The procedure avoids the collapse described in Sec. 4.2, 
unifying the treatment of upper confidence limits for null results and two-sided confidence 
limits for non-null results. Since its publication, the unified approach has become
the "de facto" standard frequentist method to compute confidence intervals. We give some
details of the method in Appendix A.

We now revisit the concept of sensitivity in the case of an experiment with background.
Recall that by sensitivity we mean the average upper limit that would be obtained by an
ensemble of identical replicas of such an experiment, each one with the same mean expected
background and no true signal. Let us now translate the above definition into a mathematical
formula. We define $\mathcal{U}(n|b)$ as the function yielding the (unified approach) upper limit (at
the desired CL) for a given observation $n$ and a mean predicted background level $b$. Values
for $\mathcal{U}(n|b)$ are reported in tabular form in [8] for several CL values. Given that the variable
$n$ follows a Poisson pdf, $\mathrm{Po}(n|b) \equiv \mathrm{Po}(n; b)$, then, according to our definition, the sensitivity
$S(b)$ is given by:

$$ S(b) = E[\mathcal{U}(n|b)] = \sum_{n = 0}^{\infty} \mathrm{Po}(n|b) \, \mathcal{U}(n|b) \tag{4.11} $$

This equation reads as follows: the sensitivity $S(b)$ of an experiment expecting $b$ events of
background is obtained by averaging the upper limits obtained using the unified approach
($\mathcal{U}(n|b)$) with the likelihood of the individual observations ($\mathrm{Po}(n|b)$).

Figure 6. Left: 90% CL sensitivity curve as a function of mean background prediction $b$ as obtained
according to Eq. (4.11) (black dots) or by simply assuming $n = b$ (red dots). Right: sensitivity curve
as a function of $b$ for 90% and 95% CL.

Let us work out an example. Assume that an experimental team proposes a 100 kg
germanium detector to run underground for 10 years. The team claims that the expected
background rate in the experiment is $c = 0.001$ counts/(keV·kg·y). Their energy ROI has a
width of 5 keV. The predicted background is then $b = 0.001 \times 100 \times 5 \times 10 = 5$ events. If we
set $\mu = 0$ (no signal) and $b = 5$ in Eq. (4.7), we obtain the same distribution as for the case
($\mu = 5$, $b = 0$) described in Sec. 4.1. If we generate many experiments following $\mathrm{Po}(n; \mu = 5)$,
the fraction of the times that we will obtain $n_{\rm obs} = 0, 1, 2, 3, \ldots$ is therefore described by
the pdf shown in the left panel of Fig. 2. Consider for instance the case $n = 5$. The upper
limit at 90% CL following the prescription of the unified approach is $\mathcal{U}(5|5) = 4.99$ (see [8]);
we also know that $\mathrm{Po}(5|5) = 0.175$ (see Fig. 2). Analogously, for $n = 4$, $\mathcal{U}(4|5) = 3.66$ and
$\mathrm{Po}(4|5) = 0.175$; for $n = 3$, $\mathcal{U}(3|5) = 2.78$ and $\mathrm{Po}(3|5) = 0.14$; and so forth. The sensitivity
of the experiment, computed as the average of all the experiments, would then be:

$$ S(5) = \sum_{n = 0}^{\infty} \mathrm{Po}(n|5) \, \mathcal{U}(n|5) = 4.99 \times 0.175 + 3.66 \times 0.175 + \ldots = 5.17 $$

This value, $S(5) = 5.17$ (90% CL), differs from the one sometimes used as upper limit,
$\mathcal{U}(5|5) = 4.99$. As illustrated in the left panel of Fig. 6, the difference between the $S(b)$
and $\mathcal{U}(b|b)$ sensitivity estimates is small but significant for all $b$. The latter approach is not
strictly correct, introduces fluctuations, and underestimates the sensitivity limit for all $b$.
The right panel in Fig. 6 compares the 90% with the 95% CL sensitivity curves obtained
with our sensitivity procedure.
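The worked example above can be approximated with a compact implementation of the likelihood-ratio ordering. The sketch below is not the tabulation of [8]: it scans the signal mean on a grid (step 0.01, up to an assumed maximum of 30), builds the acceptance region for each grid point, and takes as upper limit the largest signal mean whose acceptance region contains the observed count. Function names, grid parameters and the truncation of the Poisson sum are all illustrative assumptions, so the results should agree with the quoted values only up to the grid resolution and the small adjustments applied in the published tables.

```python
import math


def poisson_pmf(n, mean):
    """Poisson probability of n counts, evaluated in log space for stability."""
    if mean == 0.0:
        return 1.0 if n == 0 else 0.0
    return math.exp(n * math.log(mean) - mean - math.lgamma(n + 1))


def fc_upper_limits(b, cl=0.90, mu_max=30.0, step=0.01, n_max=100):
    """Unified-approach upper limit for every observed count in 0..n_max.

    For each signal mean mu on the grid, counts are ranked by the ratio
    P(n | mu, b) / P(n | mu_best, b), with mu_best = max(0, n - b), and added
    to the acceptance region until its probability reaches cl.
    """
    counts = range(n_max + 1)
    # Likelihood at the best physically allowed signal: total mean max(n, b).
    best = [poisson_pmf(n, max(float(n), b)) for n in counts]
    upper = [0.0] * (n_max + 1)
    for i in range(int(round(mu_max / step)) + 1):
        mu = i * step
        probs = [poisson_pmf(n, mu + b) for n in counts]
        ranked = sorted(counts, key=lambda n: probs[n] / best[n], reverse=True)
        covered = 0.0
        for n in ranked:
            upper[n] = mu  # mu only increases, so the last write is the largest
            covered += probs[n]
            if covered >= cl:
                break
    return upper


def sensitivity(b, cl=0.90, n_terms=25):
    """Average upper limit for background-only experiments, with the Poisson
    sum truncated after n_terms terms (adequate for small b)."""
    upper = fc_upper_limits(b, cl)
    return sum(poisson_pmf(n, b) * upper[n] for n in range(n_terms))


if __name__ == "__main__":
    limits = fc_upper_limits(5.0)
    print("U(5|5) =", round(limits[5], 2))
    print("S(5)   =", round(sensitivity(5.0), 2))
```

Taking the largest accepted signal mean, rather than the first one at which the count drops out of the acceptance region, guards against the small gaps that the discreteness of the Poisson distribution can open in the confidence belt.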

We should note that, in the large background approximation, the sensitivity curve as a
function of $b$ follows the expected classical limit:

$$ \mu_{\rm up} = S(b) \simeq \alpha \cdot \sqrt{b}, \quad \text{for large } b, \tag{4.12} $$

where $\alpha = 1.64$ (1.96) at 90% (95%) CL.

If we substitute $N$ in Eq. (4.6) by the upper limit obtained using the unified approach,
then, in the limit of large background:

$$ m_{\beta\beta} = K_2 \sqrt{\frac{b^{1/2}}{\varepsilon \, M t}} \tag{4.13} $$

where $K_2$ is a constant depending on the isotope. If the background $b$ is proportional to the
exposure $Mt$ and to the width $\Delta E$ of the ROI:

$$ b = c \cdot M t \cdot \Delta E \tag{4.14} $$

with the background rate $c$, in counts/(keV·kg·y), constant across the energy ROI, then:

$$ m_{\beta\beta} = K_2 \sqrt{1/\varepsilon} \left( \frac{c \, \Delta E}{M t} \right)^{1/4} \tag{4.15} $$

That is, in the limit of large background, the sensitivity to $m_{\beta\beta}$ improves very slowly with
exposure, as $(Mt)^{-1/4}$.
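The practical consequence of this scaling is easy to quantify. The sketch below assumes a background proportional to rate, exposure and ROI width, b = c · Mt · ΔE, and a large-background sensitivity proportional to ε^(-1/2) (c ΔE / Mt)^(1/4); the isotope-dependent constant is set to 1, so only ratios between configurations are meaningful. Function and parameter names are illustrative.

```python
def expected_background(rate, mass_kg, time_y, roi_keV):
    """Background counts for a rate in counts/(keV kg y)."""
    return rate * mass_kg * time_y * roi_keV


def large_background_sensitivity(exposure_kg_y, rate, roi_keV, efficiency=1.0, k2=1.0):
    """m_bb sensitivity in the large-background limit, up to the
    isotope-dependent constant k2 (set to 1 here, so units are arbitrary)."""
    return k2 * efficiency ** -0.5 * (rate * roi_keV / exposure_kg_y) ** 0.25


if __name__ == "__main__":
    print("b =", expected_background(0.001, 100.0, 10.0, 5.0))
    s_small = large_background_sensitivity(100.0, 0.01, 5.0)
    s_large = large_background_sensitivity(1600.0, 0.01, 5.0)
    print("gain from 16x more exposure:", s_small / s_large)
```

In this regime a sixteenfold increase in exposure buys only a factor of two in $m_{\beta\beta}$, whereas the same factor of two can be obtained by reducing the product of background rate and ROI width by a factor of sixteen at fixed exposure.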

In this paper, we compare the sensitivity of several experiments using the unified
approach to compute upper limits following Equation (4.11). As was the case for our choice of the
NME, this is a conservative but robust option. It is conservative because we do not use all
the information available to compute upper limits (such as the potentially different energy
distributions for the signal and the backgrounds in the ROI). It is robust, however, since the
unified approach ensures coverage, unlike, for example, the profile likelihood method in the
low background limit. Furthermore, it does not require precise knowledge of the pdfs
of signal and backgrounds in the ROI for the different experiments. While other methods
can be used and many have been proposed in the literature (see, for example, [36]), we
believe that the use of the unified approach allows a simple (and therefore easily reproducible)
calculation of sensitivities. As discussed in Sec. 3 and motivated here in the large exposure
limit (but valid for all exposures), proponents of an experiment simply need to provide three
parameters describing the detector performance and the isotope to be used, in order to allow
the derivation of the $m_{\beta\beta}$ sensitivity for a given exposure. The three parameters are: (a) the
FWHM energy resolution $\Delta E$, defining the energy ROI (of, say, 1 FWHM around $Q_{\beta\beta}$) for
the $\beta\beta0\nu$ search; (b) the background rate $c$ per unit of $\beta\beta$ isotope mass, energy and time; (c)
the $\beta\beta$ detection efficiency $\varepsilon$. With this information, the sensitivity can be computed using
Equation (4.11).

5 A selection of proposals 

Many double beta decay experiments have been proposed in the last decade. A recent review 
[2] quotes 14 projects in different stages of development and using a variety of detection 
techniques and isotopes. In this paper we compare the relative merits of 7 of these proposals, 
chosen as representatives of four different experimental approaches. For each of the 7 experiments 
considered, we report in Table 2 the $\beta\beta$ isotope that will (likely) be used, together 
with the three parameters discussed in Sec. 4.3 that describe the detector performance. It is 
possible to find all these figures in the literature, although the level of detail in the discussion 
and the uncertainties associated with them vary greatly from case to case. 

Two scenarios have been considered regarding the background rates expected in the 
experiments: the reference (R) scenario, which implies in most cases an improvement of at 
least one order of magnitude over the state of the art represented by Heidelberg-Moscow, 
and the optimistic (O) scenario, which projects a further improvement with respect to the R 
scenario. In order to evaluate the scalability of the proposals to large masses (see Sec. 6.3), 
we also give a reference and an optimistic assumption for the $\beta\beta$ isotope mass. 

In the following, Sec. 5.1, a brief description of the proposals is given. The validity 
of the background rate assumptions, in light of the results achieved so far, is discussed in 
Sec. 5.2. 



Experiment     Isotope       Resolution                 Efficiency   Scenario   Background rate                   Mass
                             (% at $Q_{\beta\beta}$)                            ($10^{-3}$ counts/(keV·kg·y))     (kg)

CUORE          $^{130}$Te    0.18                       0.80         R          40                                200
                                                                     O          1                                 400
EXO            $^{136}$Xe    3.3                        0.70         R          1                                 160
                                                                     O          0.5                               1000
GERDA          $^{76}$Ge     0.16                       0.80         R          10                                15
                                                                     O          1                                 35
KamLAND-Zen    $^{136}$Xe    9.5                        0.80         R          0.5                               360
                                                                     O          0.1                               1000
NEXT           $^{136}$Xe    0.7                        0.30         R          0.2                               90
                                                                     O          0.06                              1000
SNO+           $^{150}$Nd    6.5                        0.50         R          10                                50
                                                                     O          1                                 500
SuperNEMO      $^{82}$Se     4.0                        0.30         R          0.4                               7
                                                                     O          0.06                              100

Table 2. Proposals considered in the $m_{\beta\beta}$ sensitivity comparison. For each proposal, the isotope that 
will (likely) be used, together with estimates for detector performance parameters (FWHM energy 
resolution, detection efficiency and background rate per unit of energy, time and $\beta\beta$ isotope mass), 
are given. The efficiencies indicated here do not include the efficiency loss due to the 1 FWHM 
energy cut, common to all proposals. The background estimates and the $\beta\beta$ source mass include both 
a reference (R) and an optimistic (O) scenario. 
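
As a hedged illustration of how the Table 2 parameters enter the calculation, the following Python sketch evaluates the expected number of background counts in a 1 FWHM window, using the relation $b = c \cdot Mt \cdot \Delta E$ of Sec. 4. The approximate Q-values, the common exposure of $10^{3}$ kg·y and all names in the code are assumptions introduced here for illustration; they are not listed in Table 2.

```python
# Approximate Q-values in keV (assumed here; not part of Table 2).
Q_BB_KEV = {"130Te": 2527.0, "136Xe": 2458.0, "76Ge": 2039.0,
            "150Nd": 3371.0, "82Se": 2996.0}

# (isotope, FWHM resolution in % at Q_bb, reference rate, optimistic rate);
# rates in counts/(keV kg y), as given in Table 2.
PROPOSALS = {
    "CUORE": ("130Te", 0.18, 40e-3, 1e-3),
    "EXO": ("136Xe", 3.3, 1e-3, 0.5e-3),
    "GERDA": ("76Ge", 0.16, 10e-3, 1e-3),
    "KamLAND-Zen": ("136Xe", 9.5, 0.5e-3, 0.1e-3),
    "NEXT": ("136Xe", 0.7, 0.2e-3, 0.06e-3),
    "SNO+": ("150Nd", 6.5, 10e-3, 1e-3),
    "SuperNEMO": ("82Se", 4.0, 0.4e-3, 0.06e-3),
}


def expected_background(name, exposure_kg_y, scenario="R"):
    """Expected background counts b = c * Mt * dE in a 1 FWHM window."""
    isotope, res_percent, rate_r, rate_o = PROPOSALS[name]
    roi_width = res_percent / 100.0 * Q_BB_KEV[isotope]   # window width in keV
    rate = rate_r if scenario == "R" else rate_o
    return rate * exposure_kg_y * roi_width


for name in PROPOSALS:
    b_ref = expected_background(name, 1000.0, "R")
    b_opt = expected_background(name, 1000.0, "O")
    print(f"{name:12s} b(R) = {b_ref:8.1f}   b(O) = {b_opt:7.2f}")
```

The resulting $b$ is the quantity that would be fed into the sensitivity calculation of Equation (4.11); experiments with excellent resolution can tolerate a higher rate $c$ for the same $b$.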

5.1 Brief description of the proposals 

The first experimental approach we consider is high-resolution calorimetry. These 
are detectors characterized by excellent energy resolution. They also have the advantages of 
simplicity and compactness. The classic germanium diodes and the bolometers fall in this 
category. 

Two experiments, GERDA [37-40] and MAJORANA [41-43], will search for $\beta\beta0\nu$ in 
$^{76}$Ge using arrays of high-purity germanium detectors. This is a well-established technique 
that offers outstanding energy resolution (better than 0.2% FWHM at the Q-value) and 
high efficiency (~0.80). In its first phase GERDA expects a background rate of the order 
of $10^{-2}$ counts/(keV·kg·y). The Collaboration aims to improve this by an additional 
order of magnitude, to $10^{-3}$ counts/(keV·kg·y) in the ROI, during its second phase. For 
its part, MAJORANA anticipates a background rate of $10^{-3}$ counts/(ROI·kg·y), that is, 
about 4 times less than the second phase of GERDA. For the sake of concreteness, we choose 
GERDA to represent the current germanium-based proposals, assigning its first phase 
($10^{-2}$ counts/(keV·kg·y) and 15 kg of isotope mass) to the R scenario and its second phase 
($10^{-3}$ counts/(keV·kg·y) and 35 kg) to the O scenario.$^{2}$ 

$^{2}$ The GERDA and MAJORANA Collaborations plan to merge in the future for a $^{76}$Ge experiment aiming 
for a ton-scale isotope mass and for a $10^{-4}$ counts/(keV·kg·y) background rate. Such a third phase for future 
$^{76}$Ge detectors is considered beyond the scope of this work, which focuses on experimental proposals for the 
coming decade. 

CUORE [44-46] is an array of TeO$_2$ bolometers. Because $^{130}$Te has a large natural 
abundance (~34%), the need for enrichment is less important than for other isotopes. 
CUORE can collect a large mass of isotope (~200 kg for a total detector mass 
of about 740 kg). The advantages of the technique are similar to those of germanium 
experiments, with about the same energy resolution and efficiency for the signal. Studies have 
already demonstrated background rates at the 0.2 counts/(keV·kg·y) level. We assign 
$4 \times 10^{-2}$ counts/(keV·kg·y) and $10^{-3}$ counts/(keV·kg·y) to the reference and optimistic 
scenarios respectively. Concerning the mass, we assume the full 200 kg of CUORE in the R 
scenario, and a possible doubling of the mass (by enriching the target to some 70%) for the 
O scenario. 

The second approach considered is xenon time-projection chambers. Xenon is a 
suitable detection medium, providing both scintillation and ionization signals. It has a 
$\beta\beta$-decaying isotope, $^{136}$Xe, with a natural abundance of about 10%. Compared to other $\beta\beta$ 
sources, xenon is easy (thus relatively cheap) to enrich in the candidate isotope. It has no 
other long-lived radioactive isotopes, and no spallation products constitute a background. 
There are two possibilities for a xenon TPC: a cryogenic liquid xenon TPC (LXe), or a (high 
pressure) gas chamber (HPXe). We consider examples of both options here. 

The Enriched Xenon Observatory (EXO) [47, 48] will search for $\beta\beta0\nu$ decay in $^{136}$Xe 
using a 200-kg liquid-xenon (enriched at 80% in $^{136}$Xe) TPC during its first phase. The 
use of liquefied xenon results in a relatively modest energy resolution, 3.3% FWHM at $Q_{\beta\beta}$ 
[49] (using the anti-correlation between ionization and scintillation). The signal detection 
efficiency is estimated by the Collaboration to be about 70%. Background rates of order 
$10^{-3}$ counts/(keV·kg·y) are expected in EXO-200. The improvement with respect to the 
high-resolution calorimeters comes from the event topological information, which allows the 
rejection of superficial backgrounds and multi-hit events, so that only energetic gammas 
from $^{208}$Tl and $^{214}$Bi constitute a significant source of background. The optimistic scenario, 
$5 \times 10^{-4}$ counts/(keV·kg·y), projects a further improvement in the detector radiopurity. The 
ultimate goal of the EXO Collaboration is to develop the so-called barium tagging [50]. This 
technique would allow the detection of the ion product of the $^{136}$Xe decay, and thus eliminate 
all backgrounds but the intrinsic $\beta\beta2\nu$. For this study, however, we have not considered its 
benefits. Concerning the mass, we take 200 kg of Xe (160 kg of $^{136}$Xe) for the reference 
scenario, and assume one ton of isotope for the optimistic scenario. 

The NEXT Collaboration is building a 100 kg high-pressure gaseous xenon (enriched 
at 90% in $^{136}$Xe) TPC [51]. The experiment aims to take advantage of both good energy 
resolution (better than 1% FWHM at $Q_{\beta\beta}$) and the presence of a $\beta\beta0\nu$ topological signature for 
further background suppression. As a result, the background rate is expected to be one of 
the lowest among all the proposals considered here: $2 \times 10^{-4}$ counts/(keV·kg·y) in the 
reference scenario, with a further improvement by a factor of 3 in the optimistic scenario. 
The low background rate, however, comes at the expense of a relatively inefficient signal 
detection: ~30%. In order to reach its ambitious goals, the NEXT Collaboration plans 
to rely on electroluminescence to amplify the ionization signal, using two separate 
photo-detection schemes for an optimal measurement of both calorimetry and tracking [52]. As in 
the case of EXO, we assume 100 kg of xenon (90 kg of $\beta\beta$ isotope) for the reference scenario 
and 1 ton for the optimistic scenario. 

The third category of experiments is large self-shielding calorimetry. These are 
large detectors in which the source, dissolved in liquid scintillator, is surrounded by a large 
buffer volume. The advantages of the approach are the self-shielding against external 
backgrounds and the scalability, since it appears possible to dissolve large isotope masses. The 
main disadvantage is poor energy resolution. 

SNO+ [53-55] proposes to fill the Sudbury Neutrino Observatory (SNO) with ultra-pure 
liquid scintillator. A mass of several tens of kilograms of $\beta\beta$-decaying material can be 
added to the experiment by dissolving a neodymium salt in the scintillator. The natural 
abundance of the $^{150}$Nd isotope is 5.6%. Given the liquid scintillator light yield and 
photocathode coverage of the experiment, a modest energy resolution performance (about 6% 
FWHM at $Q_{\beta\beta}$) is expected. External backgrounds can be rejected with a relatively tight 
fiducial volume selection, cutting about 50% of the signal. A background rate similar to 
that of GERDA in its first phase is expected (~$10^{-2}$ counts/(keV·kg·y)) in the R scenario, 
improving by a factor 10 in the O scenario. Concerning the isotope mass, SNO+ has the 
disadvantage of using natural neodymium, resulting in about 50 kg of isotope for the reference 
scenario. The optimistic scenario assumes that enriched neodymium can be used, resulting in 
a factor 10 increase of the mass. 

The KamLAND-Zen [56, 57] experiment plans to dissolve 400 kg of $^{136}$Xe in the liquid 
scintillator of KamLAND in the first phase of the experiment, and up to 1 ton in a projected 
second phase. Xenon is relatively easy to dissolve (with a mass fraction of more than 3% 
being possible) and also easy to extract. The major modification to the existing KamLAND 
experiment is the construction of an inner, very radiopure ($10^{-12}$ g/g of $^{238}$U and $^{232}$Th) 
and very transparent balloon to hold the dissolved xenon. The balloon, 1.7 meters in radius, 
would be shielded from external backgrounds by a large, very radio-pure liquid scintillator 
volume. While the energy resolution at $Q_{\beta\beta}$ (about 10%) is inferior to that of SNO+, the 
detection efficiency is much better (80%) due to its double envelope. The double envelope 
design is also responsible for the low expected background rate, $2 \times 10^{-4}$ counts/(keV·kg·y) 
in the R scenario, with a further improvement by a factor of 2 in the O scenario. 

The fourth and last category is that of Tracko-Calo experiments, where foils of 
source are surrounded by a tracking detector that provides a direct detection of the two 
electron tracks emitted in the decay. The quintessential example of this technique is the 
SuperNEMO experiment. 

SuperNEMO [58-60] is a series of modules, each one consisting of a tracker and a 
calorimeter that surround a thin foil of the $\beta\beta$ isotope. In SuperNEMO the target will likely 
be $^{82}$Se, although other isotopes such as $^{150}$Nd or $^{48}$Ca are also being considered. The mass 
of the target is limited to a few kg (typically 5 to 7) by the need to build it foil-like, and 
to minimize multiple scattering and energy loss. The tracker and calorimeter can record 
the trajectory of the charged particles and measure their energies independently. As shown 
by the successful NEMO-3 experiment [61-63], this technique, which exploits maximally 
the topological signature of the events, leads to excellent background rejection. Moreover, 
Tracko-Calo experiments allow for the determination of the $\beta\beta0\nu$ decay mechanism as the 
individual energies and trajectories of both electrons are measured [64]. In exchange, the 
selection efficiency is, as in the case of the NEXT experiment, relatively low (about 
30%), and the resolution rather modest (4% FWHM at $Q_{\beta\beta}$). This technique is very hard 
to extrapolate to large masses due to the size, complexity and cost of each module. For this 
reason, we consider that the target mass of the SuperNEMO collaboration (100 kg, or about 
20 modules) will only be achieved in the optimistic scenario. For the reference scenario we 
consider 7 kg. This is the isotope mass for the demonstrator module that the Collaboration 
expects to commission in 2013. 

Figure 7. Background rate in the ROI plotted versus the energy resolution (FWHM) for different 
past and present experiments. The dashed lines delimit bands where the experiments falling inside 
have background rates per unit of exposure, in counts/(kg·year), of the same order of magnitude. 
(Figure adapted from [65].) 



5.2 Validity of background rate assumptions 

How justified are the assumptions of the different experiments regarding their expected level 
of background suppression? Figure 7 offers some clues. It shows the background rate in 
the ROI, in counts/(keV·kg·y), versus the energy resolution (FWHM) of several past 
and present experiments. The (green) circles correspond to measured data, while the (blue) 
squares and (red) diamonds correspond, respectively, to the R and O background assumptions 
of the experiments discussed in the previous section. The dashed lines delimit bands where 
the experiments falling inside have background rates per unit of exposure, in counts/(kg·y), 
of the same order of magnitude. 

The best background rates obtained so far are rather modest if one considers the am- 
bitions of the collaborations. Even in the case that we call reference, achieving the expected 
performance poses a major experimental challenge. 

The GERDA experiment expects to reduce the background rate achieved in Heidelberg-Moscow 
(0.06 counts/(keV·kg·y) after pulse-shape discrimination) by at least a factor 
of 6 (phase I), and up to a factor of 60 (phase II). This improvement comes from better 
shielding and improved radiopurity, and from active background suppression techniques such 
as rejection of multi-site depositions [66-68]. 

CUORE aims at achieving a background rate of about $4 \times 10^{-2}$ counts/(keV·kg·y). 
A previous effort, the CUORICINO detector [69], a tower of bolometers that accumulated 
an exposure of 11.83 kg·years, measured a background rate of 0.6 counts/(keV·kg·y). 
More recent tests within the CUORE Collaboration have measured about 0.2 counts/(keV·kg·y) 
[45]. Further improvements in the background rate would probably require active 
suppression methods such as those proposed by the LUCIFER project [2]. Such techniques 
may permit a reduction of the background rate down to $10^{-3}$ counts/(keV·kg·y) or better. 

The xenon time-projection chambers have a precedent in the Gotthard experiment [70], 
a small xenon TPC (~5 kg) operated at 5 bar that obtained a background rate in the ROI of 
about 0.02 counts/(keV·kg·y) using event topological information to reject backgrounds. The 
NEXT experiment expects a factor of 100 improvement over Gotthard due to an upgraded 
detection technique: the Gotthard TPC suffered from modest energy resolution (6.6% FWHM 
at the Q-value), probably due to the addition of methane to the xenon (in order to increase 
the drift velocity and to suppress diffusion), which quenched the scintillation and ionization 
signals. NEXT aims at achieving energy resolution better than 1% FWHM at $Q_{\beta\beta}$, enough 
to separate the $\beta\beta0\nu$ signal peak from the Compton spectrum of $^{208}$Tl, the main background 
source in Gotthard. Additionally, NEXT will measure the $t_0$ of each event using the primary 
scintillation signal, a handle that Gotthard lacked. This permits the definition of a fiducial 
volume that eliminates any charged track emanating from detector surfaces. EXO projects 
a background level of $10^{-3}$ counts/(keV·kg·y), thanks to its good self-shielding. This 
background rate is a factor 5 higher than that of NEXT and compatible with the availability 
of a topological signature in the pressurized gas, not present in the liquid phase. 

SNO+ projects a background level similar to that of EXO, also due to good self-shielding. 
KamLAND-Zen, however, expects a factor of 5 lower rate than both EXO and 
SNO+, at the same level as NEXT and SuperNEMO, in spite of lacking a topological 
signature. The advantage over SNO+ is the inner balloon, which allows the xenon to be 
dissolved in a smaller volume of liquid scintillator. This reduces the fractional background due 
to the scintillator itself and improves the shielding. 

The NEMO-3 experiment has achieved a spectacular background rate of $3 \times 10^{-3}$ 
counts/(keV·kg·y) using the topological signature of the events to discriminate signal from 
background. SuperNEMO foresees an improvement of up to a factor of 60 in the background 
rate with respect to its predecessor. This implies challenging requirements in the radiopurity 
of the source foil and in the level of radon inside the detector (the two main sources of 
background in NEMO-3), since the experimental technique is essentially the same. 
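
The size of the required improvements can be checked with a few lines of Python. The sketch below only divides background rates that are quoted in Secs. 5.1 and 5.2 (achieved rate over target rate); the dictionary layout and the function name are assumptions made here for illustration.

```python
# Background rates in counts/(keV kg y) quoted in Secs. 5.1 and 5.2:
# (rate achieved by the predecessor, target rate of the new experiment).
CASES = {
    "GERDA phase I vs Heidelberg-Moscow": (0.06, 1e-2),
    "GERDA phase II vs Heidelberg-Moscow": (0.06, 1e-3),
    "CUORE (R) vs recent CUORE tests": (0.2, 4e-2),
    "NEXT (R) vs Gotthard": (0.02, 2e-4),
}


def improvement_factor(achieved, target):
    """Factor by which the achieved background rate must be reduced."""
    return achieved / target


for label, (achieved, target) in CASES.items():
    print(f"{label:38s} x{improvement_factor(achieved, target):.0f}")
```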

6 Sensitivity of the proposals 

We now turn to the results concerning the $m_{\beta\beta}$ sensitivity, as defined in Sec. 4, for the 
various $\beta\beta0\nu$ proposals discussed in Sec. 5. In Sec. 6.1, we show our results for the $m_{\beta\beta}$ 
sensitivity of the proposals as a function of exposure, for both our reference and optimistic 
scenarios. Section 6.2 illustrates how the sensitivity of the proposals is affected by changes 
in the background rate assumptions. We comment on the scalability to large isotope masses 
of the various proposals, and on their plausible $m_{\beta\beta}$ sensitivity after a common data-taking 
period of 10 years, in Sec. 6.3. The efficiency of each experiment, as well as the overall 
efficiency for a ROI of 1 FWHM, is included in all the results. 

A caveat is in order. Although we believe that our analysis is robust and simple, it 
also has some limitations. On the one hand, the quoted sensitivities could still improve 
somewhat if an energy-dependent analysis is performed. However, this requires a detailed 
understanding of the energy distribution of the various backgrounds in each experiment, and 
such information is hard to find in the existing literature. On the other hand, adding systematic 
errors in both the estimation of the background and the calculation of the efficiency will 
partially spoil the sensitivity. This last point may severely affect the sensitivity results, but 
it is very difficult to implement in a general study like ours, since the published information 
concerning systematic errors of the different proposals is very scarce. 

Figure 8. The $m_{\beta\beta}$ sensitivity (at 90% CL) as a function of exposure of the seven different $\beta\beta0\nu$ 
proposals considered. For each proposal, two detector performance scenarios are shown: (a) reference 
case, (b) optimistic case. For illustrative purposes, the filled circles indicate 10 years of run-time 
according to the reference and optimistic isotope mass assumptions in Table 2. 

6.1 Sensitivity as a function of the exposure 

Figure 8 shows the $m_{\beta\beta}$ sensitivity (at 90% CL) as a function of exposure for the seven 
$\beta\beta0\nu$ proposals reviewed. We consider here not only the impact of isotope choice, efficiency 
and energy resolution, but also the effect of the expected background rate in the R and O 
scenarios. For the sake of simplicity, let us focus here on a fixed exposure of $10^{3}$ kg·y for all 
proposals. 

In the reference scenario and for the above-mentioned exposure, NEXT and CUORE 
have the best sensitivities, reaching 69 and 74 meV at 90% CL, respectively. KamLAND-Zen, 
EXO, GERDA and SuperNEMO follow, with sensitivities around 90 meV. SNO+ reaches 
a sensitivity of 134 meV. Taking into account that these numbers represent a limit of maximum 
exposure for most of the proposals, 1 ton·year, it follows that the "next-generation" 
experiments will, at best, explore the degenerate spectrum. 

In the optimistic scenario, the lower background regime for all experiments yields 
significantly better sensitivities for the same $10^{3}$ kg·y exposure. In this case, the best 
sensitivity is obtained by CUORE (44 meV at 90% CL). NEXT, KamLAND-Zen, GERDA 
and SuperNEMO follow, with sensitivities around 60 meV. EXO and SNO+ reach sensitivities 
around 75 meV. 

6.2 Sensitivity as a function of background rate 

Table 3 gives the sensitivity of all the proposals (for an exposure of 1 ton-year) and for five 
different background rate assumptions. We can observe the following trends: 



Experiment     $c = 10^{-1}$   $c = 10^{-2}$   $c = 10^{-3}$   $c = 10^{-4}$   $c = 10^{-5}$

CUORE           93              54              34             ...              24
EXO            258             147              85              51              37
GERDA          155              91              58              45              42
KamLAND-Zen    314             178             101              60              39
NEXT           269             154              91              61              50
SNO+           237             134              76              45              30
SuperNEMO      354             201             115              69              48

Table 3. Large exposure $m_{\beta\beta}$ sensitivity (for $Mt = 10^{3}$ kg·y and at 90% CL) of the different $\beta\beta0\nu$ 
experiments in terms of the background rate $c$ (expressed in counts/(keV·kg·y)). 

• For the background rate characterizing past $\beta\beta0\nu$ experiments, $c \sim 10^{-1}$ counts/(keV·kg·y), 
CUORE would be the experiment with the best sensitivity, 93 meV at 90% CL. 

• With a background rate suppression one order of magnitude better, $c = 10^{-2}$ counts/(keV·kg·y), 
GERDA would reach 91 meV at 90% CL and CUORE would almost fully explore 
the degenerate spectrum (down to 54 meV at 90% CL). 

• For background rates at the $10^{-3}$ counts/(keV·kg·y) level, CUORE would partially 
probe the inverted hierarchy ($m_{\beta\beta} \sim 34$ meV), GERDA would almost fully explore the 
degenerate spectrum, while EXO, NEXT and SNO+ would have a sensitivity better 
than 100 meV. 

• For even lower background rates, $c = 10^{-4}$ counts/(keV·kg·y), NEXT, EXO, KamLAND-Zen 
and SuperNEMO would almost fully cover the degenerate spectrum; GERDA would 
probe a small part of the inverse hierarchy; and CUORE would cover it almost 
completely. 

• In the very low background rate limit of $c = 10^{-5}$ counts/(keV·kg·y), one can see that 
the improvement in sensitivity of both CUORE and GERDA is very small, indicating 
that these experiments have reached the "background-free" regime for the considered 
exposure. NEXT, EXO, KamLAND-Zen and SuperNEMO, instead, keep improving 
their sensitivity, and partially explore the inverse hierarchy. 
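
The trends listed above can be quantified with a short Python sketch that takes the Table 3 entries for a subset of the proposals and computes the sensitivity ratio between consecutive decades in background rate. In the large-background regime, where the sensitivity scales as the fourth root of the background rate, each decade should give a factor of about $10^{-1/4} \approx 0.56$; ratios approaching 1 signal the background-free regime. The variable and function names are assumptions made here for illustration.

```python
# m_bb sensitivities in meV from Table 3, for c = 1e-1, 1e-2, 1e-3, 1e-4, 1e-5
# counts/(keV kg y), at an exposure of 1e3 kg y.
TABLE3 = {
    "EXO": [258, 147, 85, 51, 37],
    "GERDA": [155, 91, 58, 45, 42],
    "KamLAND-Zen": [314, 178, 101, 60, 39],
    "NEXT": [269, 154, 91, 61, 50],
    "SNO+": [237, 134, 76, 45, 30],
    "SuperNEMO": [354, 201, 115, 69, 48],
}

# Expected ratio per decade of background rate when m_bb scales as c^(1/4).
LARGE_B_RATIO = 10 ** -0.25


def decade_ratios(values):
    """Ratio of each sensitivity to the one at a ten times higher background."""
    return [later / earlier for earlier, later in zip(values, values[1:])]


for name, values in TABLE3.items():
    ratios = ", ".join(f"{r:.2f}" for r in decade_ratios(values))
    print(f"{name:12s} {ratios}   (large-b expectation {LARGE_B_RATIO:.2f})")
```

For the poor-resolution experiments the first ratios stay close to 0.56, while for GERDA the last ratio is above 0.9, in line with the background-free behaviour described in the last bullet.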

6.3 Scalability to large isotope masses 

Having considered the $m_{\beta\beta}$ sensitivity as a function of exposure in kg·y, we now consider the 
mass scalability of the different proposals. 

Germanium experiments such as GERDA and MAJORANA are easier to scale, given 
their compactness. While masses of the order of 100 kg appear possible, it may be difficult, 
in particular for economic reasons, to go much beyond that. CUORE aims already at a 
large mass, and can, presumably, scale up at least by a factor two if enrichment at large scale 
is feasible and the backgrounds can be kept under control. 

It will be very difficult to scale up SuperNEMO. One is faced here with the difficulty 
of replicating modules (about 20 are needed), high cost, and the need for a very large 
underground volume to host the modules and their shielding. The technique is clearly unsuitable for 
large masses, and reaching the collaboration's target mass of 100 kg appears to be a formidable 
challenge. 

Experiment     Scenario   PMR    Published NME results

SNO+           R          159    217    136    ...    118
               O           50     69     43    ...     37
SuperNEMO      R          208    171    139    274    158    174
               O           63     52     42     83     48     53

Table 4. Sensitivity of the different proposals at 90% CL in the R and O case, taking into account 
also the projected isotope mass that can be collected by the different proposals and for a 10-year 
data-taking period. All sensitivity numbers refer to $m_{\beta\beta}$ values in meV units. The PMR column 
relies on our estimate for NME most probable values, while subsequent columns use published NME 
results (see Sec. 2 for details). 

The scalability of SNO+ depends on the feasibility of enriching neodymium, most likely 
a difficult enterprise. In case of success, the mass could be multiplied by a factor 10. 

All the xenon-based experiments can, in principle, use one ton of enriched xenon or more 
for the next-to-next generation. Notice that 700 kg of $^{136}$Xe have already been acquired by 
the 3 collaborations. Clearly the best isotope among the ones considered here, as far as 
scalability to large $\beta\beta$ isotope masses is concerned, appears to be $^{136}$Xe. 

Table 4 summarizes our $m_{\beta\beta}$ sensitivity findings if we further assume the reference and 
optimistic values for the $\beta\beta$ isotope masses as given in Tab. 2, and for a data-taking period 
of 10 years, common to all proposals. In the reference scenario, CUORE, EXO, KamLAND-Zen 
and NEXT cover most of the degenerate spectrum, with CUORE and KamLAND-Zen 
reaching the best sensitivity ($m_{\beta\beta} \sim 62$ meV at 90% CL assuming our PMR estimate for 
NME most probable values). In the optimistic scenario, these four experiments cover part of 
the inverted hierarchy, with NEXT reaching the best sensitivity ($m_{\beta\beta} \sim 27$ meV at 90% CL). 
Even taking into account the uncertainties associated with this calculation, it appears that the 
scalability to large masses and expected low background of the xenon experiments give them 
an advantage over other experimental approaches, with the possible exception of CUORE. 

For completeness, Tab. 4 also shows how $m_{\beta\beta}$ sensitivity results are affected by different 
assumptions regarding NME values. When available, ISM NME values [10, 11] give $m_{\beta\beta}$ 
sensitivity results that are about 30% worse than those obtained from our estimate for NME 
most probable values (PMR column in Tab. 4). By construction, non-ISM NME values 
[12-18] tend to give $m_{\beta\beta}$ sensitivity results that are better than the PMR ones by a similar 
relative amount. 

7 Conclusions 

The answer to the question of the neutrino particle/antiparticle nature has become a critical 
issue after the observation of neutrino oscillations. Not only would it improve our 
understanding of this intriguing fermion, but it would also have an enormous impact on many other 
areas of fundamental physics. 

Neutrinoless double beta decay ($\beta\beta0\nu$) is possible if and only if neutrinos are massive 
Majorana particles. The observation of such a process is considered the most promising 
strategy to demonstrate the Majorana nature of the neutrino. Furthermore, the measurement 
of the lifetime for this process would provide direct information on the absolute scale of 
neutrino masses. 

An obvious goal for the next-generation experiments is to push the current limits down 
to Majorana masses $m_{\beta\beta}$ of at least 100 meV to unambiguously confirm or discard the 
evidence for $\beta\beta0\nu$ claimed by part of the Heidelberg-Moscow Collaboration [7], with sufficient 
statistics. This appears to be within the range of all the proposals. A more ambitious goal 
is to fully explore the degenerate spectrum, down to $m_{\beta\beta} \sim 50$ meV. None of the proposals 
appears quite capable of this, under the assumptions discussed in this work (the reference 
scenario, 90% confidence level, data-taking period of 10 years), although CUORE ($m_{\beta\beta} \sim 62$ meV), 
KamLAND-Zen ($m_{\beta\beta} \sim 62$ meV), NEXT ($m_{\beta\beta} \sim 71$ meV) and EXO ($m_{\beta\beta} \sim 79$ meV) 
end up quite close. 

The optimistic scenario described in the text may be implemented by the next-to-next 
generation of experiments. The goal here would be to at least partially explore the inverse 
hierarchy. This seems to be within the reach of NEXT ($m_{\beta\beta} \sim 27$ meV), CUORE ($m_{\beta\beta} \sim 30$ meV), 
KamLAND-Zen ($m_{\beta\beta} \sim 32$ meV) and perhaps EXO ($m_{\beta\beta} \sim 41$ meV). 

More generally, the suitability of SNO+, GERDA and SuperNEMO for a next-to-next 
round of experiments is essentially dependent upon their capability to scale the technology 
to large masses. Scalability appears easier for CUORE, in particular if the techniques being 
developed in the context of the LUCIFER Collaboration (see for example [2]) permit 
large background rate reductions. Finally, it appears that $^{136}$Xe would be a particularly 
favorable isotope to use, since it permits target masses of 1 ton or more and low-background 
experimental techniques such as the ones proposed by NEXT, KamLAND-Zen and EXO. 

Last but not least, while we believe that the general trends observed in the sensitivity 
comparisons make sense, one should take the numbers presented here cum grano salis. There 
are many uncertainties, ranging from the precise values of the nuclear matrix elements to 
the ultimate background rate and isotope mass that can be achieved by the different 
experimental techniques. Nevertheless, we suggest that $\beta\beta0\nu$ collaborations adopt the simple and 
unambiguous procedure we have described in this work to derive $m_{\beta\beta}$ sensitivities, in order 
to allow for more meaningful physics case comparisons. 

A Construction of unified-approach confidence intervals 

For completeness and for pedagogical purposes, in this appendix we describe in some detail 
how to construct a confidence interval for a signal mean to be inferred from observations, 
using the unified approach by Feldman and Cousins [8] and in the case of a Poisson process 
with background. In this case, the relevant pdf to observe $n$ events given an (unknown) signal 
mean $\mu$ and a (known) background mean $b$ is given by the Poisson distribution: 

$$\mathrm{Po}(n; \mu + b) = \frac{(\mu + b)^{n}\, e^{-(\mu + b)}}{n!} \qquad (A.1)$$

Just as can be done for classical confidence intervals (that is, the one-sided upper 
confidence limit and the two-sided central confidence interval discussed in Secs. 4.1 and 4.2), the 
unified approach uses the Neyman construction of confidence belts [35]. In this construction, 
given the known mean background expectation $b$ and for each value of the unknown signal 
mean $\mu$, one selects an interval in $n$ such that: 

$$\sum_{n=n_1}^{n_2} \mathrm{Po}(n; \mu + b) \geq \mathrm{CL} \qquad (A.2)$$

where CL is the desired confidence level, for example CL = 0.90. Once this exercise is 
repeated for all possible values of $\mu$ and once the experiment's outcome $n_{\rm obs}$ is known, the 
confidence interval $[\mu_{\rm lo}, \mu_{\rm up}]$ for the signal mean $\mu$ can be extracted. 

However, while the acceptance interval $n \in [n_1, n_2]$ is constrained by Eq. A.2, it is 
not fully specified by it. The freedom on how to specify such intervals marks the difference 
between the classical and the unified approach confidence belt construction. In the unified 
approach method, such acceptance intervals at fixed $\mu$ are determined by an ordering 
principle for the $n$ values based on likelihood ratios, as will be shown below. The entire 
procedure can be summarized as follows: 

1. Compute the mean background expectation, b. Suppose that, in our case, b = 1.0. 
Note that b does not need to be an integer value, in general. 

2. For the given mean background expectation $b$ and for each possible measurement 
outcome $n$, compute the best estimator $\mu_{\rm best}$ for the true value of the mean signal yield $\mu$. 
If $\mu$ were unconstrained, the best estimator $\mu_{\rm best}$ would simply be found by maximizing 
the Poisson probability for any given $b$ and $n$: 

$$ \left. \frac{\partial\, \mathrm{Po}(n; \mu + b)}{\partial \mu} \right|_{n,b} = 0 \;\Rightarrow\; \mu_{\rm best} = n - b. \tag{A.3} $$

However, we know that only non-negative values for $\mu$ are physically allowed, so that 
our best estimator is given by: 

$$ \mu_{\rm best} = \max(0, n - b) \tag{A.4} $$
Figure 9. For a given mean background expectation $b = 1.0$, $\mathrm{Po}(n; \mu_{\rm best} + b)$ as a function of all 
possible measurement outcomes $n$ is shown. For each $n$, the value $\mu_{\rm best}$ is also shown in the figure. 



In the equations above, the measurement outcome $n$ can take any non-negative integer 
value, while $\mu_{\rm best}$ can take any non-negative value. The values of $\mu_{\rm best}$ for $b = 1.0$ and 
for each possible measurement outcome $n$ are reported in Tab. 5. 



 n    $\mu_{\rm best}$    $\mathrm{Po}(n; \mu_{\rm best} + b)$
 0    0.0                 0.368
 1    0.0                 0.368
 2    1.0                 0.271
 3    2.0                 0.224
 4    3.0                 0.195
 5    4.0                 0.175

Table 5. For a given mean background expectation $b = 1.0$, values for the signal best estimator $\mu_{\rm best}$ 
and for $\mathrm{Po}(n; \mu_{\rm best} + b)$ as a function of all possible measurement outcomes $n$. 



3. For the given mean background expectation $b$ and for each possible measurement 
outcome $n$, compute the likelihood $\mathrm{Po}(n; \mu_{\rm best} + b)$ of obtaining $n$ given the best-fit 
physically allowed signal mean $\mu_{\rm best}$. This is reported in tabular form in Tab. 5, and in 
graphical form in Fig. 9. Explicitly, the function $\mathrm{Po}(n; \mu_{\rm best} + b)$ is equal to: 

$$ \mathrm{Po}(n; \mu_{\rm best} + b) = \begin{cases} \dfrac{b^n e^{-b}}{n!} & \text{if } n \leq b \ (\Rightarrow \mu_{\rm best} = 0) \\[2ex] \dfrac{n^n e^{-n}}{n!} & \text{if } n > b \ (\Rightarrow \mu_{\rm best} > 0) \end{cases} \tag{A.5} $$



4. We now consider a possible value for the unknown true signal mean $\mu$. For simplicity, 
we will show the computational details only for $\mu = 0$ and for $\mu = 1.0$. Note that $\mu$ 
does not need to be an integer value, in general. 

Figure 10. The function $\mathrm{Po}(n; \mu + b)$ for a mean background expectation $b = 1.0$, and a signal mean 
$\mu = 0$ (left) or $\mu = 1.0$ (right). 

5. For the given mean background expectation $b$ (fixed in step 1), and for the given true 
signal mean $\mu$ (fixed in step 4), order the possible measurement outcomes $n$ from most 
to least likely. This ordering is done according to the values of the likelihood ratio (also 
called profile likelihood) [71]: 

$$ \mathcal{L}_R = \frac{\mathrm{Po}(n; \mu + b)}{\mathrm{Po}(n; \mu_{\rm best} + b)}, \tag{A.6} $$



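where the likelihood $\mathrm{Po}(n; \mu_{\rm best} + b)$ was computed in step 3, while $\mathrm{Po}(n; \mu + b)$ is the 
likelihood of obtaining $n$ for the given signal mean $\mu$. Explicitly, the function $\mathcal{L}_R$ is 
equal to: 

$$ \mathcal{L}_R = \begin{cases} \dfrac{(\mu + b)^n \, e^{-(\mu + b)}}{b^n \, e^{-b}} & \text{if } n \leq b \ (\Rightarrow \mu_{\rm best} = 0) \\[2ex] \dfrac{(\mu + b)^n \, e^{-(\mu + b)}}{n^n \, e^{-n}} & \text{if } n > b \ (\Rightarrow \mu_{\rm best} > 0) \end{cases} \tag{A.7} $$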



The highest rank ($\mathcal{R} = 1$) is assigned to the $n$ value having the highest value of $\mathcal{L}_R$. 

As examples, $\mathrm{Po}(n; \mu + b)$ values for each $n$ are given in the left panels of Tab. 6 and 
of Fig. 10 for $\mu = 0$, and in the right panels of Tab. 6 and of Fig. 10 for $\mu = 1.0$. The 
corresponding likelihood ratios $\mathcal{L}_R$ and ranking of $n$ values from most to least likely are 
also given in Tab. 6 for $\mu = 0$ and 1.0. The likelihood ratio distributions are also shown 
in graphical form in Fig. 11, again for $\mu = 0$ (left) and $\mu = 1.0$ (right). Note that $\mathcal{L}_R$ is 
monotonically non-increasing with $n$ for $\mu = 0$, but this is not the case for $\mu = 1.0$, where 
$n = 2$ has the highest rank ($\mathcal{R} = 1$). 



              $\mu = 0$ (left)                  |              $\mu = 1.0$ (right)
 n   Po(n; mu+b)   L_R     R    P               |  n   Po(n; mu+b)   L_R     R    P
 0   0.368         1.000   1    0.368           |  0   0.135         0.368   5    0.947
 1   0.368         1.000   2    0.736           |  1   0.271         0.736   3    0.722
 2   0.184         0.680   3    0.920           |  2   0.271         1.000   1    0.271
 3   0.061         0.274   -    -               |  3   0.180         0.805   2    0.451
 4   0.015         0.078   -    -               |  4   0.090         0.462   4    0.812
 5   0.003         0.017   -    -               |  5   0.036         0.206   -    -

Table 6. For a given mean background expectation $b = 1.0$ and mean signal $\mu = 0$ (left table) and 
$\mu = 1.0$ (right table), ranking $\mathcal{R}$ of the $n$ values according to the likelihood ratio 
$\mathcal{L}_R = \mathrm{Po}(n; \mu + b)/\mathrm{Po}(n; \mu_{\rm best} + b)$, and confidence interval construction in $n$ until a 
cumulative probability $\mathcal{P} \geq 0.90$ is reached. See text for details. 
Figure 11. The function $\mathcal{L}_R = \mathrm{Po}(n; \mu + b)/\mathrm{Po}(n; \mu_{\rm best} + b)$ for $\mu = 0$ (left) and $\mu = 1.0$ (right). 



6. For a given $b$ and $\mu$, construct a confidence interval in $n$ by adding the $\mathrm{Po}(n; \mu + b)$ 
values until the cumulative probability is at least as large as the desired confidence level. 
In this confidence interval construction, the $\mathrm{Po}(n; \mu + b)$ values are added according to 
their ranking (as defined in step 5), starting from the highest-ranked $n$ value ($\mathcal{R} = 1$). As 
examples, Tab. 6 (left) shows that only three $n$ values are necessary to reach a 90% CL 
for the $\mu = 0$ case. The three values are added in this order: $n = 0, 1, 2$. The resulting 
confidence interval is therefore $n \in [0, 2]$. Similarly, Tab. 6 (right) shows that five $n$ 
values are needed to reach a 90% CL for the $\mu = 1.0$ case: $n = 2, 3, 1, 4, 0$, resulting in the 
interval $n \in [0, 4]$. The larger the desired confidence level is, the larger is the corresponding $n$ interval. 



7. Repeat steps 4-6 for different values of $\mu$. In this scan over all possible values of the 
unknown true signal mean $\mu$, one typically starts from $\mu = 0$, and gradually increases 
the $\mu$ values according to some $\mu$ step size. Even though we showed only the cases $\mu = 0$ 
and $\mu = 1.0$, the step size is often chosen to be much smaller than unity when 
small signals are searched for. 

Once the confidence intervals in $n$ have been obtained for each $\mu$, the confidence belt 
construction is complete for the known mean background expectation $b$ and for the desired 
confidence level (90% CL, in our case). Note that this confidence belt construction does 
not require any knowledge of the actual experiment's outcome, $n_{\rm obs}$. 

8. Now perform the measurement. Suppose that the result is $n_{\rm obs} = 1$, that is, compatible 
with the background-only mean expectation $b = 1.0$. The left panel in Fig. 12 shows 
a vertical line corresponding to this value. The intercepts of the confidence belts with 
the $n_{\rm obs} = 1$ vertical line fix the allowed range in the true signal mean, $\mu \in [\mu_{\rm lo}, \mu_{\rm up}]$. 
Given that the observable $n$ is discrete, one more prescription is needed in this case 
to fully specify the range $\mu \in [\mu_{\rm lo}, \mu_{\rm up}]$. When several values of $\mu$ yield the same 
acceptance interval $n \in [n_1, n_2]$ in Eq. A.2, the constant $n_{\rm obs}$ vertical line does not 
intersect the belts in only two points, but rather along two vertical segments. When 
this occurs, we conservatively take the confidence interval to have $\mu_{\rm lo}$ corresponding to 
the smallest value of $\mu$ with $n_2 = n_{\rm obs}$, and $\mu_{\rm up}$ as the largest value of $\mu$ with $n_1 = n_{\rm obs}$. 
As can be read from the left panel in Fig. 12, in this case the range is $\mu \in [0, 3.4]$. On 
the other hand, suppose that the experiment had measured a significant excess above 
the background-only prediction: $n_{\rm obs} = 10$. The right panel in Fig. 12 shows the $\mu$ 
range that would have been obtained in this case: $\mu \in [4.5, 15.5]$. 
Figure 12. Confidence belts constructed for a Poisson process with mean background expectation 
$b = 1.0$, using the unified approach and for a 90% confidence level. The left (right) panel shows how 
to infer a range in the signal yield $\mu$ given a measurement outcome of $n_{\rm obs} = 1$ (10). 
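
The eight steps above can be sketched in a few lines of Python. This is only an illustrative sketch and not the code used for the figures: the function names, the grid step of 0.01 in the signal mean, the scan limit `mu_max` and the truncation of outcomes at `n_max` are all assumptions made here. It also assumes that the accepted outcomes form a contiguous interval, so that only the smallest and largest accepted n need to be stored.

```python
import math


def poisson_pmf(n, mean):
    """Po(n; mean) from Eq. A.1, evaluated in log space for numerical stability."""
    if mean == 0.0:
        return 1.0 if n == 0 else 0.0
    return math.exp(n * math.log(mean) - mean - math.lgamma(n + 1))


def acceptance_interval(mu, b, cl=0.90, n_max=60):
    """Steps 2-6: acceptance interval [n1, n2] in n for a fixed signal mean mu."""
    outcomes = range(n_max + 1)
    probs = [poisson_pmf(n, mu + b) for n in outcomes]
    # Likelihood ratio of Eq. A.6, using mu_best = max(0, n - b) from Eq. A.4.
    ratios = [probs[n] / poisson_pmf(n, max(0.0, n - b) + b) for n in outcomes]
    # Rank outcomes by decreasing ratio; sorted() is stable, so ties keep the smaller n first.
    ranked = sorted(outcomes, key=lambda n: -ratios[n])
    accepted, cumulative = [], 0.0
    for n in ranked:
        accepted.append(n)
        cumulative += probs[n]
        if cumulative >= cl:
            break
    return min(accepted), max(accepted)


def confidence_belt(b, cl=0.90, mu_max=20.0, step=0.01):
    """Step 7: scan mu on a grid and store (mu, n1, n2) for each grid point."""
    n_points = int(round(mu_max / step)) + 1
    return [(i * step, *acceptance_interval(i * step, b, cl)) for i in range(n_points)]


def confidence_interval(n_obs, belt):
    """Step 8: smallest and largest scanned mu whose acceptance interval contains n_obs."""
    allowed = [mu for mu, n1, n2 in belt if n1 <= n_obs <= n2]
    return min(allowed), max(allowed)


if __name__ == "__main__":
    belt = confidence_belt(b=1.0)
    for n_obs in (1, 10):
        mu_lo, mu_up = confidence_interval(n_obs, belt)
        print(f"n_obs = {n_obs:2d}: mu in [{mu_lo:.2f}, {mu_up:.2f}]")
```

For b = 1.0, `acceptance_interval` should reproduce the intervals in n read from Tab. 6, and the two printed ranges should agree with those quoted in step 8 to within roughly the grid step. The scan limit `mu_max` has to exceed the expected upper bound, otherwise the interval is silently truncated.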



The unified approach procedure described above solves two problems of classical 
confidence belts, namely that: (a) a downward fluctuation in background can produce an empty 
confidence interval (see Sec. 4.2), and (b) using the measured result to decide whether to 
use a central or upper ordering principle leads to the wrong coverage (flip-flopping, see [8]). 
The unified approach never produces empty confidence intervals, and provides the correct 
frequentist coverage because the measurement outcome is not used to construct the 
confidence belts. In this approach, if the lower bound obtained by the above procedure is strictly 
$\mu_{\rm lo} = 0$, then an upper limit is quoted; otherwise, if $\mu_{\rm lo} > 0$, a central confidence interval 
is given. With these definitions of upper limit and central interval, and as can be seen in 
Fig. 12, this method smoothly transitions from an upper limit to a central interval for the 
signal $\mu$ as one moves from a null result (compatible with background-only expectation, left 
panel) to a non-null result (right panel). 
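
The transition from upper limit to central interval can be made explicit with a short, self-contained Python sketch that classifies the quoted result for a range of outcomes. As before, this is an illustration under stated assumptions rather than the procedure used to produce Fig. 12: the function names, the coarse grid step of 0.05, the scan limit `mu_max` and the cutoff `n_max` are choices made here, and a strictly positive background mean is assumed so that all logarithms are defined.

```python
import math


def log_pmf(n, mean):
    """log Po(n; mean) from Eq. A.1; assumes mean > 0 (true here since b > 0)."""
    return n * math.log(mean) - mean - math.lgamma(n + 1)


def acceptance_interval(mu, b, cl=0.90, n_max=60):
    """Likelihood-ratio-ordered acceptance interval [n1, n2] for a fixed mu."""
    outcomes = range(n_max + 1)
    # Log of the likelihood ratio; mu_best + b equals max(b, n) by Eq. A.4.
    log_ratio = [log_pmf(n, mu + b) - log_pmf(n, max(b, n)) for n in outcomes]
    accepted, cumulative = [], 0.0
    for n in sorted(outcomes, key=lambda k: -log_ratio[k]):
        accepted.append(n)
        cumulative += math.exp(log_pmf(n, mu + b))
        if cumulative >= cl:
            break
    return min(accepted), max(accepted)


def quoted_results(b, n_obs_values, cl=0.90, mu_max=20.0, step=0.05):
    """Classify each outcome as an upper limit (mu_lo = 0) or a central interval."""
    grid = [i * step for i in range(int(round(mu_max / step)) + 1)]
    belt = [(mu, *acceptance_interval(mu, b, cl)) for mu in grid]
    results = {}
    for n_obs in n_obs_values:
        allowed = [mu for mu, n1, n2 in belt if n1 <= n_obs <= n2]
        mu_lo, mu_up = min(allowed), max(allowed)
        kind = "upper limit" if mu_lo == 0.0 else "central interval"
        results[n_obs] = (kind, mu_lo, mu_up)
    return results


if __name__ == "__main__":
    for n_obs, (kind, mu_lo, mu_up) in quoted_results(1.0, range(11)).items():
        print(f"n_obs = {n_obs:2d}: {kind:16s} [{mu_lo:.2f}, {mu_up:.2f}]")
```

Since the acceptance interval at zero signal for b = 1.0 is [0, 2] (Tab. 6, left), outcomes up to 2 yield an upper limit in this sketch, while any larger outcome excludes zero signal and yields a central interval. The same belt serves every outcome, which is why no decision based on the data enters the construction.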
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